PAVE: Write-print Creation with MapReduce

Authors

  • Leo St. Amour
  • Frederick Ulrich
  • Andreas Kellas
  • Alexander Molnar
  • Suzanne J. Matthews
Abstract

Cyber-crime is becoming alarmingly common through the use of anonymous e-mails. Author attribution helps digital forensics investigators filter through a large set of possible authors and focus traditional investigative techniques on the most probable culprits. A recent promising technique is to construct a write-print for each known author and compare it to the write-print extracted from the anonymous message(s). A write-print is a unique digital fingerprint created by mining frequent patterns from a particular author’s writing style. Parallel computing enables us to leverage multiple cores in the creation of author write-prints. We introduce Parallel Author Verification of E-mail (PAVE), a MapReduce algorithm for generating author write-prints in parallel. Our algorithm is able to achieve up to 90% accuracy when tested on a subset of the Enron dataset. We believe the community will find the PAVE system useful to expedite author identification in time-sensitive situations.
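
The idea of building write-prints from frequent patterns in a map-and-reduce style can be illustrated with a minimal sketch. This is not the PAVE implementation: the function names, the feature labels, the restriction to patterns of one or two features, and the rule that a pattern belongs to a write-print only if it is frequent for no other author are all assumptions made for illustration. The map and reduce steps are plain Python functions running in one process, not a Hadoop job.

```python
from collections import defaultdict
from itertools import combinations


def map_email(author, features):
    """Map step (hypothetical): emit ((author, pattern), 1) for each
    pattern of one or two stylometric features found in a single e-mail."""
    items = sorted(set(features))
    for size in (1, 2):
        for pattern in combinations(items, size):
            yield (author, pattern), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each (author, pattern) key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return counts


def write_prints(emails, min_support):
    """Return {author: set of patterns} for a list of (author, features).

    A pattern is kept if it appears in at least `min_support` (a fraction)
    of the author's e-mails and is not frequent for any other author.
    The uniqueness filter is an assumption, not taken from the paper.
    """
    totals = defaultdict(int)
    for author, _ in emails:
        totals[author] += 1

    mapped = (pair for author, feats in emails
              for pair in map_email(author, feats))
    counts = reduce_counts(mapped)

    frequent = defaultdict(set)
    for (author, pattern), n in counts.items():
        if n / totals[author] >= min_support:
            frequent[author].add(pattern)

    prints = {}
    for author, patterns in frequent.items():
        others = set().union(
            *(p for a, p in frequent.items() if a != author))
        prints[author] = patterns - others
    return prints


if __name__ == "__main__":
    # Toy data with made-up feature labels.
    sample = [
        ("alice", ["short_sentences", "uses_semicolon"]),
        ("alice", ["short_sentences", "uses_semicolon", "greeting_hi"]),
        ("bob", ["short_sentences", "long_words"]),
        ("bob", ["short_sentences", "long_words"]),
    ]
    for author, patterns in sorted(write_prints(sample, 1.0).items()):
        print(author, sorted(patterns))
```

Because each e-mail is mapped independently and the counts are combined only in the reduce step, the same structure could in principle be distributed across cores or machines, which is the property the abstract attributes to the MapReduce formulation.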

Similar resources

I/O Efficient Implementation of MapReduce

MapReduce is a programming model and an associated implementation used by Google for processing their massive data sets. It has a simple yet powerful interface that is amenable to a broad variety of problems. Since 2003, when the MapReduce framework was first created, more than ten thousand distinct programs have been implemented under this model. A large number of MapReduce tasks are now runni...

VC3: Trustworthy Data Analytics in the Cloud

We present VC3, the first system that allows users to run distributed MapReduce computations in the cloud while keeping their code and data secret, and ensuring the correctness and completeness of their results. VC3 runs on unmodified Hadoop, but crucially keeps Hadoop, the operating system and the hypervisor out of the TCB; thus, confidentiality and integrity are preserved even if these large ...

Simplifying the Development and Deployment of MapReduce Algorithms

MapReduce algorithms can be difficult to write and test due to the accidental complexities involved with existing MapReduce implementations. Furthermore, the configuration details involved in running MapReduce algorithms within a cloud present a set of new challenges. Our research reveals that many details of cloud configuration can be hidden from programmers in an automated and transparent man...

Social Networks Mining for Analysis and Modeling Drugs Usage

This paper presents an approach for mining and analyzing data from social media, based on the MapReduce model for processing large amounts of data and on composite applications for more sophisticated analysis, which are executed in a distributed computing environment based on a cloud platform. We applied this system for creation characteristics of users who write about drugs...

Parallelization of Maximum Entropy POS Tagging for Bahasa Indonesia with MapReduce

In this paper, the MapReduce programming model is used to parallelize the training and tagging processes in maximum entropy part-of-speech tagging for Bahasa Indonesia. In the training process, MapReduce is used for dictionary, tagtoken, and feature creation. In the tagging process, MapReduce is used to tag the lines of a document in parallel. The training experiments showed that total training time u...

Publication date: 2015